{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Train a linear regression model\n",
    "When you have your data prepared you can train a model.\n",
    "\n",
    "There are multiple libraries and methods you can call to train models. In this notebook we will use the **LinearRegression** model in the **scikit-learn** library"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We need our DataFrame, with data loaded, all the rows with null values removed, and the features and labels split into the separate training and test data. So, we'll start by just rerunning the commands from the previous notebooks."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "from sklearn.model_selection import train_test_split"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Load our data from the csv file\n",
    "delays_df = pd.read_csv('Data/Lots_of_flight_data.csv') \n",
    "\n",
    "# Remove rows with null values since those will crash our linear regression model training\n",
    "delays_df.dropna(inplace=True)\n",
    "\n",
    "# Move our features into the X DataFrame\n",
    "X = delays_df.loc[:,['DISTANCE', 'CRS_ELAPSED_TIME']]\n",
    "\n",
    "# Move our labels into the y DataFrame\n",
    "y = delays_df.loc[:,['ARR_DELAY']] \n",
    "\n",
    "# Split our data into test and training DataFrames\n",
    "X_train, X_test, y_train, y_test = train_test_split(\n",
    "                                                    X, \n",
    "                                                    y, \n",
    "                                                    test_size=0.3, \n",
    "                                                    random_state=42\n",
    "                                                   )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Use **Scikitlearn LinearRegression** *fit* method to train a linear regression model based on the training data stored in X_train and y_train"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from sklearn.linear_model import LinearRegression\n",
    "\n",
    "regressor = LinearRegression()     # Create a scikit learn LinearRegression object\n",
    "regressor.fit(X_train, y_train)    # Use the fit method to train the model using your training data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The *regressor* object now contains your trained Linear Regression model"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
